Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support default checksums #1191

Merged
merged 37 commits into from
Jan 15, 2025
Merged

feat: support default checksums #1191

merged 37 commits into from
Jan 15, 2025

Conversation

0marperez
Copy link
Contributor

Issue #

Description of changes

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@0marperez 0marperez added the no-changelog Indicates that a changelog entry isn't required for a pull request. Use sparingly. label Nov 27, 2024

This comment has been minimized.

1 similar comment

This comment has been minimized.

@0marperez
Copy link
Contributor Author

Note to reviewers: I'll address the breaking API changes and failing protocol issues soon. In the meantime, I’d appreciate a review of the checksum-related changes.

Comment on lines 29 to 31
* Calculates a request's checksum.
*
* If the checksum will be sent as a header, calculate the checksum.
* If a user supplies a checksum via an HTTP header no calculation will be done. The exception is MD5, if a user
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correctness/clarification: "Calculates a request's checksum" and "If a user supplies a checksum via an HTTP header no calculation will be done." conflict with each other

*
* @param requestChecksumRequired Model sourced flag indicating if checksum calculation is mandatory.
* @param requestChecksumCalculation Configuration option that determines when checksum calculation should be done.
* @param userSelectedChecksumAlgorithm The checksum algorithm that the user selected for the request, may be null.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming: requestChecksumAlgorithm

private val forcedToCalculateChecksum = requestChecksumRequired || requestChecksumCalculation == HttpChecksumConfigOption.WHEN_SUPPORTED
private val checksumHeader = StringBuilder("x-amz-checksum-")
private val defaultChecksumAlgorithm = lazy { Crc32() }
private val defaultChecksumAlgorithmHeaderPostfix = "crc32"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postfix -> Suffix

* The header must start with "x-amz-checksum-" followed by the checksum algorithm's name.
* MD5 is not considered a valid checksum algorithm.
*/
private fun userProviderChecksumHeader(request: HttpRequest, logger: Logger): String? {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: can be restructured as an extension function HttpRequest.checksumHeader(logger: Logger): String?

and naming: userProvidedChecksumHeader

* Users can check which checksum was validated by referencing the `ResponseChecksumValidated` execution context variable.
*
* @param shouldValidateResponseChecksumInitializer A function which uses the input [I] to return whether response checksum validation should occur
* @param responseValidationRequired Flag indicating if the checksum validation is mandatory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs: "Model sourced flag"

.removePrefix("x-amz-checksum-")
.toHashFunction() ?: throw ClientException("Could not parse checksum algorithm from header $checksumHeader")

if (context.protocolResponse.body is HttpBody.Bytes) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this branch added? The spec says we should delay checksum calculation until the body is consumed:

Where possible, SDKs MUST defer this calculation and validation until the payload is actually consumed by the user i.e. a payload must not be read twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toHashingBody doesn't support Bytes bodies. I decided to calculate the checksum in memory instead of adding support for Bytes to toHashingBody. It should be safe if the request is not being streamed but I think you're right that we're not following the spec

Copy link
Contributor

@lauzadis lauzadis Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toHashingBody doesn't support Bytes bodies

Does it need to? The previous implementation never seemed to need it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it does, the spec has some unit tests with non-streaming bodies

Copy link
Contributor Author

@0marperez 0marperez Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline and we'll be removing support for HttpBody.Bytes from FlexibleChecksumsResponseInterceptor because HTTP bodies are never bytes. Source

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think removing support for HttpBody.Bytes is a bad idea. Our current implementations of OkHttp and CRT engines do not return bytes, that's true, but future implementations or other engines wrapped by users may. In addition, interceptors may be used to rewrite response bodies by streaming them into memory, processing them, and then rewriting the body in a new form (possibly HttpBody.Bytes). I think we need to ensure each type of HttpBody is handled correctly.

@@ -126,23 +133,6 @@ class FlexibleChecksumsRequestInterceptorTest {
assertEquals(0, call.request.headers.getNumChecksumHeaders())
}

@Test
fun itSetsChecksumHeaderViaExecutionContext() = runTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this test removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're no longer setting the checksum header via execution context

@@ -163,29 +168,4 @@ class FlexibleChecksumsResponseInterceptorTest {

op.roundTrip(client, TestInput("input"))
}

@Test
fun testSkipsValidationWhenDisabled() = runTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this test removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be covered under ChecksumConfigTest > ResponseChecksumValidation but I added it back


public enum class HttpChecksumConfigOption {
/**
* SDK will create/validate checksum if the service marks it as required or if this is set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs: create -> calculate

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correctness: missing tests for when requestChecksumRequired = false and requestChecksumCalculation = HttpChecksumConfigOption.WHEN_REQUIRED. Same for response validation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The request test cases are already here. I think the response validation unit tests need one more to be exhaustive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those tests in ChecksumConfigTest.kt aren't validating interceptor behavior like the tests in this file do.

* @param checksumAlgorithmNameInitializer an optional function which parses the input [I] to return the checksum algorithm name.
* if not set, then the [HttpOperationContext.ChecksumAlgorithm] execution context attribute will be used.
* If the request will be streamed:
* - The checksum calculation is done asynchronously using a hashing & completing body.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "asynchronously" → "during transmission"

* - The checksum will be sent in a trailing header, once the request is consumed.
*
* If the request will not be streamed:
* - The checksum calculation is done synchronously
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "synchronously" → "before transmission"

Comment on lines 60 to 75
private val defaultChecksumAlgorithm = lazy { Crc32() }
private val defaultChecksumAlgorithmHeaderPostfix = "crc32"

private val checksumAlgorithm = userSelectedChecksumAlgorithm?.let {
val hashFunction = userSelectedChecksumAlgorithm.toHashFunction()
if (hashFunction == null || !hashFunction.isSupported) {
throw ClientException("Checksum algorithm '$userSelectedChecksumAlgorithm' is not supported for flexible checksums")
}
checksumHeader.append(userSelectedChecksumAlgorithm.lowercase())
hashFunction
} ?: if (forcedToCalculateChecksum) {
checksumHeader.append(defaultChecksumAlgorithmHeaderPostfix)
defaultChecksumAlgorithm.value
} else {
null
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Seems unnecessary to declare forcedToCalculateChecksum, defaultChecksumAlgorithm, and defaultChecksumAlgorithmHeaderPostfix as fields only to use them in one if branch. Could this just be:

?: if (requestChecksumRequired || requestChecksumCalculation == HttpChecksumConfigOption.WHEN_SUPPORTED) {
    checksumHeader.append("crc32")
    Crc32()
} else

Comment on lines 80 to 81
userProviderChecksumHeader(context.protocolRequest, logger)?.let {
logger.debug { "User supplied a checksum via header, skipping checksum calculation" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: These log messages which talk about the user feel awkward. Log messages are for users so it seems strange to mention the user in the third person. I'd suggest rewording these to be more about data/actions and less about the actors involved (e.g., "Found a provided checksum in the request, skipping checksum calculation").


deferredChecksum.complete(checksum)
} else {
logger.debug { "Calculating checksum asynchronously" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We've lost the log message that we're calculating a checksum asynchronously during transmission. I'd suggest moving your "Calculating checksum using '$checksumAlgorithm'" message into the if/else clauses, tweaking each to mention whether the checksum is being calculated before or during transmission.

Comment on lines 159 to 163
private fun String.isCompositeChecksum(): Boolean {
// Ends with "-#" where "#" is a number between 1-1000
val regex = Regex("-([1-9][0-9]{0,2}|1000)$")
return regex.containsMatchIn(this)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: When this logic is made S3 specific, we shouldn't assert on the part number fitting inside a certain range. Just "-(\d)+$" should suffice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Please add tests for HttpChecksumConfigOption.WHEN_REQUIRED. Applies to response interceptors as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The request test cases are already here. I think the response validation unit tests need one more to be exhaustive.

Comment on lines -129 to -144
@Test
fun itSetsChecksumHeaderViaExecutionContext() = runTest {
checksums.forEach { (checksumAlgorithmName, expectedChecksumValue) ->
val req = HttpRequestBuilder().apply {
body = HttpBody.fromBytes("<Foo>bar</Foo>".encodeToByteArray())
}

val op = newTestOperation<Unit, Unit>(req, Unit)
op.context[HttpOperationContext.ChecksumAlgorithm] = checksumAlgorithmName
op.interceptors.add(FlexibleChecksumsRequestInterceptor<Unit>())

op.roundTrip(client, Unit)
val call = op.context.attributes[HttpOperationContext.HttpCallList].first()
assertEquals(expectedChecksumValue, call.request.headers["x-amz-checksum-$checksumAlgorithmName"])
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Do we still use HttpOperationContext.ChecksumAlgorithm or can it be deprecated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's still used for MD5 checksums in the httpChecksumRequiredTrait

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update on this, it's no longer used anywhere so we can deprecate/remove it

Comment on lines 167 to 190
@Test
fun testSkipsValidationWhenDisabled() = runTest {
val req = HttpRequestBuilder()
val op = newTestOperation<TestInput>(req)

op.interceptors.add(
FlexibleChecksumsResponseInterceptor<TestInput> {
false
},
)

val responseChecksumHeaderName = "x-amz-checksum-crc32"

val responseHeaders = Headers {
append(responseChecksumHeaderName, "incorrect-checksum-would-throw-if-validated")
}

val client = getMockClient(response, responseHeaders)

val output = op.roundTrip(client, TestInput("input"))
output.body.readAll()

assertNull(op.context.getOrNull(ChecksumHeaderValidated))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why did we delete this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be covered under ChecksumConfigTest > ResponseChecksumValidation but I added it back

Comment on lines 31 to 34
/**
* SDK will create/validate checksum if the service marks it as required or if this is set.
*/
WHEN_SUPPORTED,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "or if this is set" → "or if the service offers optional checksums"

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@0marperez 0marperez marked this pull request as ready for review December 4, 2024 20:53
@0marperez 0marperez requested a review from a team as a code owner December 4, 2024 20:53

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Comment on lines 28 to 29
// FIXME: Re-enable. This test is broken after a smithy update: https://github.com/smithy-lang/smithy/pull/2467
// ProtocolTest("aws-json-10", "aws.protocoltests.json10#JsonRpc10"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI I know we talked about disabling some tests that are failing in Smithy 1.53.0, but these are disabling the entire test suite, which we don't want to do

@@ -32,7 +33,7 @@ import software.amazon.smithy.rulesengine.traits.EndpointTestsTrait
* shape's closure for example)
*/
@Suppress("EXTENSION_SHADOWED_BY_MEMBER")
inline fun <reified T : Shape> Model.shapes(): List<T> = shapes(T::class.java).toList()
inline fun <reified T : Shape> Model.shapes(): List<T> = shapes(T::class.java).kotlinToList()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has never been a problem before, what changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a refactor in AWS SDK Kotlin to make the codegen tests KMP, which led to the JVM CI checks failing because we've been using JVM 17. I set the JVM version in the tests to 1.8 and the toList function used above is only available since JVM 16

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok that makes sense. You should be able to use shapes(T::class.java).collect(Collectors.toList()) instead of hacking around with the Kotlin toList

import software.amazon.smithy.model.traits.HttpChecksumRequiredTrait

/**
* Handles the `httpChecksumRequired` trait.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit/docs: Explain more about how it "handles" the trait

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "meat" of the integration is the middleware and both of them have their own Kdocs. It feels like docs overkill to also add a summary of what each middleware is doing here

writer.write(
"op.context[#T.DefaultChecksumAlgorithm] = #S",
RuntimeTypes.HttpClient.Operation.HttpOperationContext,
"MD5",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question/correctness: Can/should we store this as a HashFunction rather than a String?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, I think we would have to initialize the HashFunction in the context and its seems slightly worse than what we have right now. Thoughts?

Comment on lines 34 to 39

/**
* @return The default checksum algorithm name, null if default checksums are disabled.
*/
internal fun defaultChecksumAlgorithmName(context: ProtocolRequestInterceptorContext<Any, HttpRequest>): String? =
context.executionContext.getOrNull(HttpOperationContext.DefaultChecksumAlgorithm)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correctness: This seems like the wrong place to define this function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where would be better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have said HttpChecksumRequiredInterceptor but I see it's also used by FlexibleChecksumsRequestInterceptor. I think it's fine here.

style: make it an extension val

internal val ProtocolRequestInterceptorContext<Any, HttpRequest>.defaultChecksumAlgorithmName: String?
    get() = executionContext.getOrNull(HttpOperationContext.DefaultChecksumAlgorithm)

Comment on lines +196 to +201
/**
* Convert an [HttpBody] with an underlying [HashingSource] or [HashingByteReadChannel]
* to a [CompletingSource] or [CompletingByteReadChannel], respectively.
*/
@InternalApi
public fun HttpBody.toCompletingBody(deferred: CompletableDeferred<String>): HttpBody = when (this) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These used to be private, now they are @InternalApi public, meaning they are included in our API dumps and part of backwards compatibility. Was this move necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I moved this here because I was going to add support for streaming checksums in the HttpCheksumsRequired interceptor

*/
@InternalApi
public val HashFunction.isSupportedForFlexibleChecksums: Boolean get() =
algorithmsSupportedForFlexibleChecksums.contains(this::class.simpleName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correctness: Relying on this::class.simpleName seems unsafe.

Better solution:

@InternalApi
public val HashFunction.isSupportedForFlexibleChecksums: Boolean 
    get() = when (this) {
        is Crc32, is Crc32c, is Sha1, is Sha256 -> true
        else -> false
    }

Copy link
Contributor Author

@0marperez 0marperez Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah agree, that's what we had before but we would have to keep track of what checksum algorithms are supported for flexible checksums in two places. In HashFunction.isSupportedForFlexibleChecksums & algorithmsSupportedForFlexibleChecksums.

I made a tradeoff here, it seems more risky to possibly miss updating what algorithms we support in both locations than relying on the class name. Plus we have unit tests that should lessen some of our concerns here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only other use of algorithmsSupportedForFlexibleChecksums is in a log message. I'd rather remove the log message than keep this class reflection

Comment on lines 6 to 11
@InternalApi
public fun runBlocking(block: suspend () -> Unit) {
runBlocking {
block()
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: What is this used for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's so we can have runBlocking as a runtime type, it's used in the presigner generator

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just add a new runBlocking type under the existing KotlinxCoroutines runtime type

@@ -0,0 +1,40 @@
package aws.smithy.kotlin.runtime.client.config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correctness: missing license at the top. same applies to Concurrency.kt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think we need to adjust our Ktlint for be more stringent when it comes to license headers

}
}

public enum class HttpChecksumConfigOption {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs: missing KDocs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add a new rule to Ktlint about Kdocs? Not sure it that's possible but I think I'd be helpful

}
}

public enum class HttpChecksumConfigOption {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correctness: Right now request and response checksum configs both use this enum. In the future, their config options might diverge, which will force us to make an API break. I'd recommend splitting this into HttpChecksumRequestConfigOption and HttpChecksumResponseConfigOption now to get ahead of that.

Copy link
Contributor Author

@0marperez 0marperez Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like just speculation for now, I say we only separate them if that time comes. Changing the type of this config option would be a breaking change for all SDKs. I think it's more likely that we would make additive changes

Copy link
Contributor

@lauzadis lauzadis Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not speculative, there was some discussion during the spec review about adding more options to either request or response but not both.

The spec calls out RequestChecksumCalculation and ResponseChecksumValidation as separate config types, so adding a new config option to either of those wouldn't be a breaking change for SDKs which implement it as the spec says.

This comment has been minimized.

@0marperez 0marperez added acknowledge-artifact-size-increase acknowledge-api-break Acknowledge that a change is API breaking and may be backwards-incompatible. Review carefully! labels Dec 24, 2024

This comment has been minimized.

1 similar comment

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Comment on lines 13 to 28
/**
* Handles checksum calculation so that checksums will be cached during retry loop
*/
@InternalApi
public abstract class AbstractChecksumInterceptor : HttpInterceptor {
private var cachedChecksum: String? = null

override suspend fun modifyBeforeSigning(context: ProtocolRequestInterceptorContext<Any, HttpRequest>): HttpRequest {
cachedChecksum ?: calculateChecksum(context).also { cachedChecksum = it }
return cachedChecksum?.let { applyChecksum(context, it) } ?: context.protocolRequest
cachedChecksum = cachedChecksum ?: calculateChecksum(context)

return if (cachedChecksum != null) {
applyChecksum(context, cachedChecksum!!)
} else {
context.protocolRequest
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: "checksums will be cached during retry loop"...is that what we want? One of the things which could occur between a failed first attempt and a subsequent retry is the body changing because, say, it's a file stream and the underlying file has been modified. In that case, we'd send the old checksum and the new file body?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was originally added for performance reasons. There's a discussion about it here. In your example someone/something would have to modify the underlying file in a file stream between retries. I'm not sure how often that happens for our customers but I'm open to getting rid of it.

}

logger.debug { "Checksum wasn't provided, selected, or isn't required: skipping checksum calculation" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness: This message still fires in the case we're doing the chunked streaming hashing but it shouldn't.

Comment on lines +97 to +114
/**
* Determines what checksum algorithm to use, null if none is required
*/
private fun resolveChecksumAlgorithm(
requestChecksumRequired: Boolean,
requestChecksumCalculation: RequestHttpChecksumConfig?,
requestChecksumAlgorithm: String?,
context: ProtocolRequestInterceptorContext<Any, HttpRequest>,
): HashFunction? =
requestChecksumAlgorithm
?.toHashFunctionOrThrow()
?.takeIf { it.isSupportedForFlexibleChecksums }
?: context.defaultChecksumAlgorithmName
?.toHashFunctionOrThrow()
?.takeIf {
(requestChecksumRequired || requestChecksumCalculation == RequestHttpChecksumConfig.WHEN_SUPPORTED) &&
it.isSupportedForFlexibleChecksums
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment: Nice fluid/functional style! 🤩

Comment on lines 131 to 135
else -> {
val bodyBytes = req.body.readAll()!!
req.body = bodyBytes.toHttpBody()
val bodyBytes = req.body.readAll() ?: byteArrayOf()
if (req.body.isOneShot) req.body = bodyBytes.toHttpBody()
bodyBytes.hash(checksumAlgorithm).encodeBase64String()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: This is an interesting change. We're now replacing the request body only if it's a oneshot body. Replayable (multi-shot?) bodies don't get replaced and so we'll read them again later when transmitting the payload.

In theory that could be slightly more performant if the backing channel/source was already in memory because now we'd no longer be duplicating it. But I think it could also be slightly less performant if the backing channel/source has to be read from IO (disk, network, etc.) since we'd be doing that IO twice.

What led to this change? Was replayable stream performance a factor or was it something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it seemed like we could improve performance a bit here but I hadn't considered what you said. I don't have a strong inclination towards keeping the optimization fwiw

Comment on lines 57 to 61
override suspend fun modifyBeforeDeserialization(context: ProtocolResponseInterceptorContext<Any, HttpRequest, HttpResponse>): HttpResponse {
if (!shouldValidateResponseChecksum) {
return context.protocolResponse
}
val logger = coroutineContext.logger<FlexibleChecksumsResponseInterceptor>()

val logger = coroutineContext.logger<FlexibleChecksumsResponseInterceptor<I>>()
val configuredToVerifyChecksum = responseValidationRequired || responseChecksumValidation == ResponseHttpChecksumConfig.WHEN_SUPPORTED
if (!configuredToVerifyChecksum) return context.protocolResponse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We don't need logger in the case of !configuredToVerifyChecksum and we exit early. We could move the initialization of it closer to the use site.

Comment on lines 70 to 73
if (ignoreChecksum(serviceChecksumValue)) {
logger.info { "Checksum detected but validation was skipped." }
return context.protocolResponse
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Putting the logging statement here means we can't provide more information about why validation was skipped. I think we should log in the subclass (IgnoreCompositeFlexibleChecksumResponseInterceptor) where we can definitely state the reason.

Comment on lines 35 to 43
return when (val body = context.protocolRequest.body) {
is HttpBody.Bytes -> {
checksumAlgorithm.update(
body.readAll() ?: byteArrayOf(),
)
checksumAlgorithm.digest().encodeBase64String()
}
else -> null // TODO: Support other body types
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why the TODO here? Can't we just use HashingSource and HashingByteReadChannel to implement the other body types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can use those types. A small refactor is needed and I think deadlines shifting left me uncertain if I could get this functionality supported on time. I think we might as well do it now since the refactor will be a breaking change

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

@ianbotsf ianbotsf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good. Some minor feedback/suggestions...fix (or don't) and ship!

Comment on lines +136 to +139
/**
* Applies a checksum based on the requirements and limitations of [FlexibleChecksumsRequestInterceptor]
*/
private fun applyFlexibleChecksumsChecksum(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: The names applyFlexibleChecksumsChecksum and calculateFlexibleChecksumsChecksum are pretty unwieldy. Given that these method are already in a class with "flexible checksums" in the name, it seems like we could just stick with the original names applyChecksum and calculateChecksum. Was there a reason to extract private helper methods with longer names here?

(Same question in HttpChecksumRequiredInterceptor.kt)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's so we can share logic for caching and non-caching checksum calculation. I gave the helper functions longer names to avoid naming conflicts with the superclass functions called applyChecksum and calculateChecksum.

public open fun ignoreChecksum(checksum: String): Boolean = false
public open fun ignoreChecksum(checksum: String, logger: Logger): Boolean = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Instead of passing the logger, we could pass context: ProtocolResponseInterceptorContext<Any, HttpRequest, HttpResponse>. The full context which contains more information and may be useful to future subclasses which need more data to make a determination about ignoring a checksum. The logger would then be available to implementors as context.executionContext.coroutineContext.logger() (or ...coroutineContext.warn { ... }, etc.).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, let me do this before I merge

Comment on lines +61 to +64
req.body.contentLength == null && !req.body.isOneShot -> {
val channel = req.body.toSdkByteReadChannel()!!
channel.rollingHash(checksumAlgorithm).encodeBase64String()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why does it matter if req.body.contentLength == null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming it's a requirement for toSdkByteReadChannel and rollingHash. This code is originally from here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this was discussed in #772 (comment). That helps add some context.

Copy link

Affected Artifacts

Significantly increased in size

Artifact Pull Request (bytes) Latest Release (bytes) Delta (bytes) Delta (percentage)
smithy-client-jvm.jar 66,614 62,681 3,933 6.27%
http-jvm.jar 108,671 103,477 5,194 5.02%
Changed in size
Artifact Pull Request (bytes) Latest Release (bytes) Delta (bytes) Delta (percentage)
http-client-jvm.jar 317,319 314,811 2,508 0.80%
runtime-core-jvm.jar 815,834 812,469 3,365 0.41%

@0marperez 0marperez dismissed lauzadis’s stale review January 15, 2025 17:35

Other approval available and discussed offline this is ok to merge

@0marperez 0marperez merged commit e0c25d6 into main Jan 15, 2025
16 checks passed
@0marperez 0marperez deleted the flexible-checksums branch January 15, 2025 17:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledge-api-break Acknowledge that a change is API breaking and may be backwards-incompatible. Review carefully! acknowledge-artifact-size-increase no-changelog Indicates that a changelog entry isn't required for a pull request. Use sparingly.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants